Learning Recurrent Neural Networks with Hessian-Free Optimization: Supplementary Materials
Abstract
2 Details of the pathological synthetic problems
2.1 The addition, multiplication, and XOR problem
2.2 The temporal order problem
2.3 The 3-bit temporal order problem
2.4 The random permutation problem
2.5 Noiseless memorization
Similar resources
Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks
Multidimensional recurrent neural networks (MDRNNs) have shown remarkable performance in the area of speech and handwriting recognition. The performance of an MDRNN is improved by further increasing its depth, and the difficulty of learning the deeper network is overcome by using Hessian-free (HF) optimization. Given that connectionist temporal classification (CTC) is utilized as an objective...
Training Neural Networks with Stochastic Hessian-Free Optimization
Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent networks. HF uses the conjugate gradient algorithm to construct update directions through curvature-vector products that can be computed on the same order of time as gradients. In this paper we exploit this property and study stochastic HF with gradient and curvature mini-batches independent o...
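To make the mechanism in this abstract concrete, here is a minimal numpy sketch of one stochastic HF update, assuming the usual structure of these methods: the gradient comes from one mini-batch, while conjugate gradient runs against damped curvature-vector products computed on an independent mini-batch. The function names, the damping constant, and the callable signatures (`loss_grad`, `curvature_vp`) are illustrative assumptions, not code from the paper.

```python
import numpy as np

def conjugate_gradient(Gv, b, max_iters=50, tol=1e-10):
    """Approximately solve G x = b given only the matrix-vector
    product Gv(v); the curvature matrix G is never formed."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - G @ 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Gp = Gv(p)
        alpha = rs / (p @ Gp)
        x += alpha * p
        r -= alpha * Gp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def hf_step(theta, loss_grad, curvature_vp, grad_batch, curv_batch,
            damping=1.0):
    """One stochastic HF update: gradient and curvature use
    independent mini-batches; Tikhonov damping (G + damping * I)
    keeps the CG subproblem well-posed."""
    g = loss_grad(theta, grad_batch)
    Gv = lambda v: curvature_vp(theta, curv_batch, v) + damping * v
    return theta + conjugate_gradient(Gv, -g)
```

Full HF implementations also warm-start CG from the previous solution and precondition it; those refinements are omitted here for brevity.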
Learning Recurrent Neural Networks with Hessian-Free Optimization
In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train RNNs on two sets of challenging problem...
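The novel damping scheme referred to here is structural damping, which penalizes changes to the RNN's hidden-state trajectory and therefore needs access to the network internals. What can be sketched compactly is the Levenberg-Marquardt-style heuristic that HF methods in this line of work (Martens, 2010) use to adapt a Tikhonov damping coefficient between CG runs, based on how well the local quadratic model predicted the actual loss reduction. The constants and names below are the commonly quoted ones and should be read as assumptions.

```python
def reduction_ratio(f, theta, delta, g, Gv):
    """rho = actual loss decrease / decrease predicted by the local
    quadratic model q(delta) = g . delta + 0.5 * delta . G . delta."""
    actual = f(theta + delta) - f(theta)
    predicted = g @ delta + 0.5 * (delta @ Gv(delta))
    return actual / predicted

def update_damping(lam, rho):
    """Levenberg-Marquardt heuristic: if the model overestimated the
    reduction (small rho), trust it less and raise the damping; if it
    was accurate (large rho), lower the damping."""
    if rho < 0.25:
        return lam * 1.5
    if rho > 0.75:
        return lam * 2.0 / 3.0
    return lam
```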
On the Efficiency of Recurrent Neural Network Optimization Algorithms
This study compares the sequential and parallel efficiency of training Recurrent Neural Networks (RNNs) with Hessian-free optimization versus a gradient descent variant. Experiments are performed using the long short-term memory (LSTM) architecture and the newly proposed multiplicative LSTM (mLSTM) architecture. The results yield a number of insights into these architectures and optimization ...
On the importance of initialization and momentum in deep learning
Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train...
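A minimal sketch of the recipe this abstract describes, assuming the Nesterov formulation of momentum and the slowly increasing momentum schedule commonly attributed to this paper; the constants (the 250-step interval and the 0.99 cap) are assumptions, not values checked against the published text.

```python
import numpy as np

def momentum_schedule(t, mu_max=0.99):
    """Slowly increasing momentum: starts at 0.5 and approaches
    mu_max as training proceeds (assumed form of the schedule)."""
    return min(1.0 - 2.0 ** (-1.0 - np.log2(t // 250 + 1)), mu_max)

def nesterov_sgd_step(theta, velocity, grad_fn, t, lr=1e-3):
    """Nesterov momentum: take the gradient at the look-ahead point
    theta + mu * velocity, then update velocity and parameters."""
    mu = momentum_schedule(t)
    g = grad_fn(theta + mu * velocity)
    velocity = mu * velocity - lr * g
    return theta + velocity, velocity
```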